
@Shekharrajak
Contributor

Shekharrajak commented Nov 13, 2025

Contributor

comphead left a comment


Thanks @Shekharrajak for your contribution! Please add a function to the fuzz testing kit, similar to #2755.

@mbutrovich
Contributor

mbutrovich commented Nov 13, 2025

In the past I think we've encountered differences in Java and Rust's regex engines wrt graphemes. Could we get some larger UTF-8 characters in the tests?
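To make the grapheme concern concrete, here is a minimal Scala sketch of reference behaviour and test inputs that a Comet test could compare against. `SplitReference`, `sparkStyleSplit` and `describe` are hypothetical names, not part of Comet or Spark. The sketch assumes Spark's documented `split(str, regex, limit)` semantics (for `limit <= 0` the regex is applied as many times as possible, so trailing empty strings are kept) and models them with `java.lang.String.split`, i.e. Java's regex engine; it is not Spark's actual implementation and deliberately ignores the empty-pattern special case.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical reference model of Spark-style split, built on Java's regex engine.
object SplitReference {
  // Assumption: limit <= 0 means "no limit, keep trailing empty strings".
  // Java's own limit = 0 drops trailing empties, so every limit <= 0 is mapped to -1.
  def sparkStyleSplit(str: String, regex: String, limit: Int): Array[String] = {
    val javaLimit = if (limit <= 0) -1 else limit
    str.split(regex, javaLimit)
  }

  // Inputs mixing 1-, 2-, 3- and 4-byte UTF-8 characters: ASCII, U+00E9, CJK, emoji,
  // plus 'e' followed by a combining acute accent (2 code points, 1 grapheme).
  val multiByteInputs: Seq[String] = Seq(
    "a,\u00e9,日本語,😀",
    "a😀b😀c",
    "e\u0301,x"
  )

  // Reports the three "lengths" that byte-, UTF-16- and code-point-based code can disagree on.
  def describe(s: String): String = {
    val bytes = s.getBytes(StandardCharsets.UTF_8).length
    val utf16Units = s.length
    val codePoints = s.codePointCount(0, s.length)
    s"bytes=$bytes utf16=$utf16Units codePoints=$codePoints"
  }

  def main(args: Array[String]): Unit =
    multiByteInputs.foreach { s =>
      val parts = sparkStyleSplit(s, ",", -1).mkString("[", "|", "]")
      println(s"$s -> ${describe(s)} -> $parts")
    }
}
```

The emoji is one code point but two UTF-16 code units and four UTF-8 bytes, and the accented `e` is two code points rendered as one grapheme; inputs like these are where a JVM-side and a native implementation are most likely to diverge, so they are worth including alongside plain ASCII cases.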

@andygrove
Member

> In the past I think we've encountered differences in Java and Rust's regex engines wrt graphemes. Could we get some larger UTF-8 characters in the tests?

We probably need to fall back to Spark unless this config is enabled:

  val COMET_REGEXP_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
    conf("spark.comet.regexp.allowIncompatible")
      .category(CATEGORY_EXEC)
      .doc("Comet is not currently fully compatible with Spark for all regular expressions. " +
        s"Set this config to true to allow them anyway. $COMPAT_GUIDE.")
      .booleanConf
      .createWithDefault(false)
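The fallback rule suggested above can be sketched as a small decision function. This is a hypothetical illustration, not Comet's serde API: `SplitSupport`, `decide` and `isPlainLiteral` are invented names, and the idea that a delimiter with no regex metacharacters behaves identically in Java's and Rust's regex engines is an assumption that the fuzz tests should confirm rather than a guarantee. Every other pattern runs natively only when `spark.comet.regexp.allowIncompatible` is `true`.

```scala
// Hypothetical gate for running split natively versus falling back to Spark.
object SplitSupport {
  sealed trait Decision
  case object Native extends Decision
  final case class FallBackToSpark(reason: String) extends Decision

  // Characters with special meaning in a Java regex outside a character class.
  private val RegexMetaChars: Set[Char] = ".$|()[]{}^?*+\\".toSet

  // Assumption: a non-empty pattern without metacharacters is a plain literal delimiter.
  // The empty pattern has special split semantics in Spark, so it is not treated as plain.
  def isPlainLiteral(pattern: String): Boolean =
    pattern.nonEmpty && !pattern.exists(RegexMetaChars.contains)

  // Plain literals run natively; anything else needs the allowIncompatible flag.
  def decide(pattern: String, allowIncompatible: Boolean): Decision =
    if (isPlainLiteral(pattern) || allowIncompatible) Native
    else FallBackToSpark(
      s"regex '$pattern' may be incompatible; set spark.comet.regexp.allowIncompatible=true to allow it")
}
```

For example, `decide(",", allowIncompatible = false)` returns `Native`, while `decide("\\s+", allowIncompatible = false)` returns a `FallBackToSpark` carrying the reason; the same pattern runs natively once the flag is enabled.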

@Shekharrajak
Contributor Author

> Thanks @Shekharrajak for your contribution! Please add a function to the fuzz testing kit, similar to #2755.

Thanks! Added in commit 8eddd29

@Shekharrajak
Contributor Author

> In the past I think we've encountered differences in Java and Rust's regex engines wrt graphemes. Could we get some larger UTF-8 characters in the tests?

Added tests in commit 987b646.

@Shekharrajak
Contributor Author

> In the past I think we've encountered differences in Java and Rust's regex engines wrt graphemes. Could we get some larger UTF-8 characters in the tests?
>
> We probably need to fall back to Spark unless this config is enabled: `spark.comet.regexp.allowIncompatible` (`COMET_REGEXP_ALLOW_INCOMPATIBLE`, default `false`)

How can we verify that the expression is not falling back to Spark's JVM execution? @andygrove

wForget changed the title from "Support for StringSplit" to "feat: Support for StringSplit" on Nov 17, 2025
Shekharrajak force-pushed the feature/add-string-split-support branch from dbb34d5 to 1f8f2b2 on November 17, 2025 18:52
@codecov-commenter

codecov-commenter commented Nov 17, 2025

Codecov Report

❌ Patch coverage is 15.38462% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.98%. Comparing base (f09f8af) to head (2199b5f).
⚠️ Report is 783 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rc/main/scala/org/apache/comet/serde/strings.scala | 8.33% | 11 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2772      +/-   ##
============================================
- Coverage     56.12%   54.98%   -1.15%     
- Complexity      976     1329     +353     
============================================
  Files           119      167      +48     
  Lines         11743    15472    +3729     
  Branches       2251     2559     +308     
============================================
+ Hits           6591     8507    +1916     
- Misses         4012     5739    +1727     
- Partials       1140     1226      +86     



@kazuyukitanimura
Contributor

Thanks @Shekharrajak
Looks like there are Rust check failures:
https://github.com/apache/datafusion-comet/actions/runs/19441578149/job/55638326879?pr=2772

Perhaps you can try cargo fix?

@Shekharrajak
Contributor Author

> Perhaps you can try cargo fix?

I ran it, but I'm not sure why the checks keep failing.



Development

Successfully merging this pull request may close these issues.

Add support for StringSplit

7 participants